[SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3#22921
[SPARK-25908][CORE][SQL] Remove old deprecated items in Spark 3#22921srowen wants to merge 6 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
This was just a bug fix I spotted
R/pkg/R/generics.R
Outdated
There was a problem hiding this comment.
@felixcheung might want to check if I'm handling these R changes correctly
There was a problem hiding this comment.
my concern is that these are breaking changes in a version without having them deprecated first...
could we leave the old one to redirect and add .Deprecate?
There was a problem hiding this comment.
I think my comment didn't get connected to this one -- @felixcheung what do you think about the argument that this almost surely was meant to be deprecated along with counterparts in Scala/Python? leaving them in would make this inconsistent. As the degrees, radians, and approxCountDistinct are reasonably niche and have a direct replacement that's compatible with older versions, I feel like this is OK for 3.0?
There was a problem hiding this comment.
I think it's super light weight to have a approxCountDistinct that calls approx_count_distinct with deprecation?
I thought was that R API was not always sync or complete compare to python, and a breaking API change - ie. the job will fail - seems a bit drastic even in a major release.
There was a problem hiding this comment.
It is, but then again that's exactly what was deprecated and removed in Python and Scala. Major versions can have breaking changes. Yes R isn't always in sync but that's a bug not a feature? Let me surface this to dev@ as I think it's going to come up a few more times.
|
Test build #98353 has finished for PR 22921 at commit
|
There was a problem hiding this comment.
do we need to remove these? they are warnings for users if they set the wrong config right
There was a problem hiding this comment.
Yeah I can add them back. Wasn't sure whether they are still valuable or just old.
There was a problem hiding this comment.
Makes sense, will add it back. Yeah will leave this open a short while to make sure there is time to comment.
|
seems good to me; might want to leave this open for a few days so more people can take a look |
|
Test build #98354 has finished for PR 22921 at commit
|
|
Test build #98363 has finished for PR 22921 at commit
|
|
Test build #98373 has finished for PR 22921 at commit
|
There was a problem hiding this comment.
Can we rewrite this test case using select(explode()), like what we did in the following test cases?
There was a problem hiding this comment.
Yeah I'll try to bring back to the test case.
|
LGTM except a minor comment about the test case. Also we need to fix the PySpark test failure |
holdenk
left a comment
There was a problem hiding this comment.
Thanks for doing this pre Spark 3 cleanup work :)
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
I'm confused about the since annotation here, where was the degrees implementation in 2.1.0? When I look at https://spark.apache.org/docs/latest/api/R/index.html I don't see the degrees function just toDegrees>=?
There was a problem hiding this comment.
degrees was added in Scala/Python in 2.1.0, which is what I was thinking of, but yeah really this must be since 3.0.0 right? I'll fix it.
There was a problem hiding this comment.
yes.. (version here is R API specific)
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
Similar comment with degrees
python/pyspark/sql/functions.py
Outdated
There was a problem hiding this comment.
Looks like the removal of this is causing the test failure, maybe do a grep for approxCountDistinct in the tests?
There was a problem hiding this comment.
Yeah, in some cases the deprecated user-facing method was named the same way as some internal method and I changed the wrong one. I'll investigate.
python/pyspark/storagelevel.py
Outdated
There was a problem hiding this comment.
cc @MLnick I know this was a thing on your radar in some way for dataframe caching maybe? Do we actually want to remove this for 3+?
|
Test build #98414 has finished for PR 22921 at commit
|
|
Looks okay to me too but I'd also leave this open for few more days. |
R/pkg/R/generics.R
Outdated
There was a problem hiding this comment.
my concern is that these are breaking changes in a version without having them deprecated first...
could we leave the old one to redirect and add .Deprecate?
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
degrees and radians will need to be added to NAMESPACE file for export
R/pkg/R/functions.R
Outdated
There was a problem hiding this comment.
it's actually new in R for 3.0.0 then
There was a problem hiding this comment.
Right, will fix that one too if I missed it, per #22921 (comment)
|
Yeah it's a good point that these weren't deprecated, but I assume they should have been. Same change, same time, same logic. given that it's a reasonably niche method, I thought it would be best to go ahead and be consistent here? |
|
Test build #98433 has finished for PR 22921 at commit
|
|
Test build #98480 has finished for PR 22921 at commit
|
srowen
left a comment
There was a problem hiding this comment.
@felixcheung ready for another look. I retained the existing methods but deprecated them, and directed them to the newer methods in the JVM, because the old ones are gone. Not sure I got it 100% right.
|
Test build #98536 has finished for PR 22921 at commit
|
|
Test build #98537 has finished for PR 22921 at commit
|
| #' head(select(df, approx_count_distinct(df$gear, 0.02))) | ||
| #' head(select(df, countDistinct(df$gear, df$cyl))) | ||
| #' head(select(df, n_distinct(df$gear))) | ||
| #' head(distinct(select(df, "gear")))} |
There was a problem hiding this comment.
we only need one set - they both are @rdname column_aggregate_functions so will duplicate all other examples
There was a problem hiding this comment.
Thanks, @HyukjinKwon fixed this. Pending tests, does the change look OK to you on the R side @felixcheung ?
Fix CRAN check failure at 22921
R/pkg/R/functions.R
Outdated
| #' @aliases toRadians toRadians,Column-method | ||
| #' @note toRadians since 1.4.0 | ||
| setMethod("toRadians", | ||
| signature(x = "Column"), |
|
Test build #98551 has finished for PR 22921 at commit
|
|
Test build #98561 has finished for PR 22921 at commit
|
|
Test build #4419 has finished for PR 22921 at commit
|
|
Merged to master |
## What changes were proposed in this pull request? - Remove some AccumulableInfo .apply() methods - Remove non-label-specific multiclass precision/recall/fScore in favor of accuracy - Remove toDegrees/toRadians in favor of degrees/radians (SparkR: only deprecated) - Remove approxCountDistinct in favor of approx_count_distinct (SparkR: only deprecated) - Remove unused Python StorageLevel constants - Remove Dataset unionAll in favor of union - Remove unused multiclass option in libsvm parsing - Remove references to deprecated spark configs like spark.yarn.am.port - Remove TaskContext.isRunningLocally - Remove ShuffleMetrics.shuffle* methods - Remove BaseReadWrite.context in favor of session - Remove Column.!== in favor of =!= - Remove Dataset.explode - Remove Dataset.registerTempTable - Remove SQLContext.getOrCreate, setActive, clearActive, constructors Not touched yet - everything else in MLLib - HiveContext - Anything deprecated more recently than 2.0.0, generally ## How was this patch tested? Existing tests Closes apache#22921 from srowen/SPARK-25908. Lead-authored-by: Sean Owen <sean.owen@databricks.com> Co-authored-by: hyukjinkwon <gurwls223@apache.org> Co-authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>
| override def session(sparkSession: SparkSession): this.type = super.session(sparkSession) | ||
|
|
||
| // override for Java compatibility | ||
| override def context(sqlContext: SQLContext): this.type = super.session(sqlContext.sparkSession) |
There was a problem hiding this comment.
@srowen This public method seems had not been deprecated before removal, and is avaiable in 2.4.5.
scala> import org.apache.spark.ml.util.GeneralMLWriter
import org.apache.spark.ml.util.GeneralMLWriter
scala> new GeneralMLWriter(null).context(spark.sqlContext)
res0: org.apache.spark.ml.util.GeneralMLWriter = org.apache.spark.ml.util.GeneralMLWriter@26b150cd
There is no deprecation warning above. Does it matter?
There was a problem hiding this comment.
This seems properly deprecated in MLWriter as its parent.
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.util.GeneralMLWriter@context(sqlContext:org.apache.spark.sql.SQLContext):GeneralMLWriter.this.type
and the Scaladoc explicitly shows this at GeneralMLWriter too. Seems right to remove together if we should in MLWriter.
There was a problem hiding this comment.
Yeah it was deprecated in 2.0.0 and marked for removal in 3.0.0.
/**
* Sets the Spark SQLContext to use for saving/loading.
*
* @deprecated Use session instead. This method will be removed in 3.0.0.
*/
@Since("1.6.0")
@deprecated("Use session instead. This method will be removed in 3.0.0.", "2.0.0")
I think ideally the Java overload and subclass overrides would be marked deprecated too, but they implicitly are. If there were a case that this is actually used, we could revive it, but just wondering how often people would be using save + SQLContext?
There was a problem hiding this comment.
I never use this method, just check it. Thanks!
What changes were proposed in this pull request?
Not touched yet
How was this patch tested?
Existing tests